Material and idea source: https://github.com/bupaverse
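# packages used throughout (assumed to be installed)
library(bupaverse)        # bupaR, edeaR, eventdataR, processmapR, ...
library(lubridate)        # ymd_hms
library(processanimateR)  # animate_process
library(psmineR)          # ps_detailed, ps_aggregated
sc1_data <- read.csv("C:/Users/nikol/Downloads/scenario1.csv", sep=";")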
head(sc1_data)
sc1_activitylog <- sc1_data %>%
# rename timestamp variables appropriately
dplyr::rename(start = activity_started,
complete = activity_ended) %>%
# convert timestamps to date-time (POSIXct) values
convert_timestamps(columns = c("start", "complete"), format = ymd_hms) %>%
activitylog(case_id = "patient",
activity_id = "handling",
resource_id = "employee",
timestamps = c("start", "complete"))
#frequency: absolute, absolute_case, relative, relative_case
print(process_map(sc1_activitylog, type=frequency("absolute")))
#performance: median/mean, "years"/"semesters"/"quarters"/"months"/"weeks"/"days"/"hours"/"mins"/"secs"
print(process_map(sc1_activitylog, type=performance(median,"secs")))
resource_map(sc1_activitylog, type=frequency("absolute"))
resource_map(sc1_activitylog, type=performance(median,"secs"))
sc2_data <- read.csv("C:/Users/nikol/Downloads/scenario2.csv", sep=";")
head(sc2_data)
sc2_data %>%
# recode lifecycle variable appropriately
dplyr::mutate(registration_type = forcats::fct_recode(registration_type,
"start" = "started",
"complete" = "completed")) %>%
convert_timestamps(columns = "time", format = ymd_hms) %>%
eventlog(case_id = "patient",
activity_id = "handling",
activity_instance_id = "handling_id",
lifecycle_id = "registration_type",
timestamp = "time",
resource_id = "employee") %>%
to_activitylog() -> sc2_activitylog
sc2_activitylog
# Log of 10 events consisting of:
1 trace
1 case
5 instances of 5 activities
5 resources
Events occurred from 2018-09-20 17:16:02 until 2021-09-20 17:03:14
# Variables were mapped as follows:
Case identifier: patient
Activity identifier: handling
Resource identifier: employee
Timestamps: start, complete
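Scenario 2 records one row per lifecycle event, and the summary above shows that the converted log carries one pair of start and complete timestamps per activity instance. The reshaping idea can be sketched in base R on a small made-up table. The rows below are hypothetical and only borrow the column names of the scenario file; this illustrates the concept and is not how bupaR implements to_activitylog().

```r
# Hypothetical event-level data: two rows (start, complete) per activity instance
events <- data.frame(
  handling_id = c(1, 1, 2, 2),
  handling = c("Registration", "Registration", "Triage", "Triage"),
  registration_type = c("start", "complete", "start", "complete"),
  time = as.POSIXct(c("2021-01-01 08:00:00", "2021-01-01 08:10:00",
                      "2021-01-01 08:15:00", "2021-01-01 08:45:00"), tz = "UTC")
)

# Split by lifecycle state and rename the timestamp column accordingly
starts <- subset(events, registration_type == "start",
                 select = c(handling_id, handling, time))
completes <- subset(events, registration_type == "complete",
                    select = c(handling_id, time))
names(starts)[3] <- "start"
names(completes)[2] <- "complete"

# One row per activity instance, with both timestamps side by side
wide <- merge(starts, completes, by = "handling_id")
wide$duration_mins <- as.numeric(difftime(wide$complete, wide$start, units = "mins"))
wide
```

With both timestamps in one row, the duration of each instance is a simple column difference, which is what the performance maps aggregate.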
#frequency: absolute, absolute_case, relative, relative_case
print(process_map(sc2_activitylog, type=frequency("absolute")))
#performance: median/mean, "years"/"semesters"/"quarters"/"months"/"weeks"/"days"/"hours"/"mins"/"secs"
print(process_map(sc2_activitylog, type=performance(median,"secs")))
resource_map(sc2_activitylog, type=frequency("absolute"))
resource_map(sc2_activitylog, type=performance(median,"secs"))
sc3_data <- read.csv("C:/Users/nikol/Downloads/scenario3.csv", sep=";")
head(sc3_data)
sc3_activitylog <- sc3_data %>%
# recode lifecycle variable appropriately
dplyr::mutate(registration_type = forcats::fct_recode(registration_type,
"start" = "started",
"complete" = "completed")) %>%
convert_timestamps(columns = "time", format = ymd_hms) %>%
eventlog(case_id = "patient",
activity_id = "handling",
activity_instance_id = "handling_id",
lifecycle_id = "registration_type",
timestamp = "time",
resource_id = "employee")
Warning in validate_eventlog(eventlog) :
The following activity instances are connected to more than one resource: 125,625,1060,1297,1859,2354
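The warning lists activity instances whose events name more than one employee. A base R check along the following lines could locate such instances before the log is built. The rows are hypothetical and only reuse the column names handling_id and employee from the scenario file.

```r
# Hypothetical events: instance 125 is registered by two different employees
events <- data.frame(
  handling_id = c(125, 125, 126, 126),
  employee = c("r1", "r2", "r3", "r3")
)

# Count the distinct resources attached to each activity instance
resources_per_instance <- tapply(events$employee, events$handling_id,
                                 function(x) length(unique(x)))

# Instances linked to more than one resource
conflicting <- names(resources_per_instance)[resources_per_instance > 1]
conflicting
```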
sc3_activitylog
# Log of 12 events consisting of:
1 trace
1 case
6 instances of 6 activities
6 resources
Events occurred from 2004-05-20 17:21:29 until 2012-05-20 17:21:59
# Variables were mapped as follows:
Case identifier: patient
Activity identifier: handling
Resource identifier: employee
Activity instance identifier: handling_id
Timestamp: time
Lifecycle transition: registration_type
#frequency: absolute, absolute_case, relative, relative_case
print(process_map(sc3_activitylog, type=frequency("absolute")))
#performance: median/mean, "years"/"semesters"/"quarters"/"months"/"weeks"/"days"/"hours"/"mins"/"secs"
print(process_map(sc3_activitylog, type=performance(median,"secs")))
resource_map(sc3_activitylog, type=frequency("absolute"))
resource_map(sc3_activitylog, type=performance(median,"secs"))
In the examples below, we use a slightly filtered version of the traffic_fines data set, which keeps the cases that follow the most frequent traces, covering 95% of all cases.
tmp <- traffic_fines %>%
filter_trace_frequency(percentage = 0.95)
Looking at the absolute-case process map below, we see that the Payment activity is executed in only 4436 cases. This number is lower than the total number of Payment executions because of the self-loop on the activity: a single case can contain several Payment instances.
tmp %>%
process_map(frequency("absolute-case"))
In relative terms, Payment represents 14.51% of all activity instances. Furthermore, 94.66% of Payment instances ended the case, while the other 5.34% were followed by another Payment.
tmp %>%
process_map(frequency("relative"))
Below, we see that Payment occurred in 46.24% of cases. In 2.5% of cases, a Payment activity was followed by another Payment.
tmp %>%
process_map(frequency("relative-case"))
Finally, the relative-consequent map shows us what happens before activities. With respect to Payment, we can see that it was preceded by:
Create Fine (73.15%)
Add Penalty (21.51%)
Payment (5.34%)
Payment itself represents 14.51% of all activity executions.
tmp %>%
process_map(frequency("relative-consequent"))
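The difference between the antecedent and consequent views can be sketched in base R on a toy log. The three traces below are hypothetical and are not taken from traffic_fines; the sketch only shows which margin the percentages are normalized over, not how processmapR computes them.

```r
# Hypothetical traces, one character vector per case
traces <- list(
  c("Create Fine", "Payment"),
  c("Create Fine", "Add Penalty", "Payment"),
  c("Create Fine", "Payment", "Payment")
)

# Directly-follows pairs: each activity paired with its successor
pairs <- do.call(rbind, lapply(traces, function(tr) {
  data.frame(from = head(tr, -1), to = tail(tr, -1))
}))
flows <- table(pairs$from, pairs$to)

# Antecedent view: what follows an activity (each row sums to 1)
rel_antecedent <- prop.table(flows, margin = 1)
# Consequent view: what precedes an activity (each column sums to 1)
rel_consequent <- prop.table(flows, margin = 2)

round(100 * rel_consequent[, "Payment"], 2)
```

In this toy log, Payment is preceded by Create Fine in half of its occurrences and by Add Penalty and Payment in a quarter each.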
Instead of frequencies, process maps can also visualize the performance of the process, by using performance() to configure the map instead of frequency().
There are three different parameters specific to the performance() configuration: the aggregation function, the time units, and the flow time type.
patients %>%
process_map(performance())
The FUN argument specifies the aggregation function to apply to the processing time (e.g. min, max, mean, median). By default, the mean duration is shown. We can change this to the maximum, for example.
patients %>%
process_map(performance(FUN = max))
Warning: There was 1 warning in `summarize()`.
ℹ In argument: `label = do.call(...)`.
ℹ In group 10: `ACTIVITY_CLASSIFIER_ = NA` and `from_id = NA`.
Caused by warning in `type()`:
! no non-missing arguments to max; returning -Inf
Warning: There were 2 warnings in `summarize()`.
The first warning was:
ℹ In argument: `value = do.call(...)`.
ℹ In group 1: `ACTIVITY_CLASSIFIER_ = "ARTIFICIAL_END"`, `next_act = NA`, `from_id = 1`, `to_id = NA`.
Caused by warning in `type()`:
! no non-missing arguments to max; returning -Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
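The message "no non-missing arguments to max; returning -Inf" is what base R emits when max() receives nothing to aggregate, which is most likely what happens here for groups without durations, such as the artificial end node. The behaviour can be reproduced in isolation. The helper safe_max below is hypothetical and not part of any bupaverse package, and whether process_map() displays an NA label sensibly has not been verified here.

```r
# max() on an empty vector returns -Inf and raises the warning seen above
empty <- numeric(0)
result <- suppressWarnings(max(empty))
result

# Hypothetical helper: return NA instead of -Inf when there is nothing to aggregate.
# Extra arguments (for example na.rm) are accepted through ... and ignored.
safe_max <- function(x, ...) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else max(x)
}

safe_max(empty)
safe_max(c(3, 7, NA))
```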
Any function that takes a numerical vector and returns a single value can be used. For example, let’s say we want to show the 90th percentile.
p90 <- function(x, ...) {
quantile(x, probs = 0.9, ...)
}
patients %>%
process_map(performance(FUN = p90))
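The requirement that the function returns a single value can be checked outside of the process map. The durations below are made up; the `...` in p90 forwards any extra argument the caller supplies, such as na.rm, to quantile().

```r
p90 <- function(x, ...) {
  quantile(x, probs = 0.9, ...)
}

# Hypothetical processing times, including one missing value
durations <- c(1:10, NA)

# na.rm is forwarded to quantile() through ...
value <- p90(durations, na.rm = TRUE)
value
length(value)
```

The result is a length-one vector named "90%", so it satisfies the single-value requirement.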
The units argument specifies the time units to be used.
patients %>%
process_map(performance(mean, "days"))
patients %>%
process_map(performance(mean, "hours"))
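Changing the units only rescales the displayed durations. The same effect can be seen with base R's difftime() on two made-up timestamps; this is only an illustration of the unit conversion, not of the internals of performance().

```r
# Two hypothetical timestamps, 36 hours apart
started <- as.POSIXct("2021-01-01 08:00:00", tz = "UTC")
ended <- as.POSIXct("2021-01-02 20:00:00", tz = "UTC")

in_hours <- as.numeric(difftime(ended, started, units = "hours"))
in_days <- as.numeric(difftime(ended, started, units = "days"))

c(hours = in_hours, days = in_days)
```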
The profile used for nodes and edges can be differentiated using the type_nodes and type_edges arguments instead of the type argument. In this way, information about frequencies and performance, or any other value, can be combined in the same graph.
patients %>%
process_map(type_nodes = frequency("relative_case"),
type_edges = performance(mean))
You can add a second layer of information to both nodes and edges.
patients %>%
process_map(type = frequency("relative_case"),
sec = frequency("absolute"))
Both primary and secondary layers can be differentiated between nodes and edges.
patients %>%
process_map(type_nodes = frequency("relative_case"),
type_edges = performance(units = "hours"),
sec_nodes = frequency("absolute"),
sec_edges = performance(FUN = max, units = "hours"))
Warning: There were 2 warnings in `summarize()`.
The first warning was:
ℹ In argument: `value = do.call(...)`.
ℹ In group 1: `ACTIVITY_CLASSIFIER_ = "ARTIFICIAL_END"`, `next_act = NA`, `from_id = 1`, `to_id = NA`.
Caused by warning in `type()`:
! no non-missing arguments to max; returning -Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
Both frequency() and performance() have the arguments color_scale and color_edges to customize the colors in the process map:
color_scale: set the color scale to fill the nodes. Can be any of the scales in RColorBrewer::brewer.pal.info. Defaults to PuBu (frequency) or Reds (performance)
color_edges: any single color to apply to the arrows. Can be a named color, hex-code, or a result of rgb. Defaults to dodgerblue4 (frequency) or red4 (performance)
Configuring the colors can be useful to harmonize the process map aesthetics when using differing layers for nodes and edges.
patients %>%
process_map(type_nodes = frequency("relative_case", color_scale = "PuBu"),
type_edges = performance(mean, color_edges = "dodgerblue4"))
Here, we use the patients event log provided by the eventdataR package.
A basic animation with static color and token size:
animate_process(patients)
Default token color, size, or image can be changed as follows:
animate_process(patients, mapping = token_aes(size = token_scale(12), shape = "rect"))
animate_process(patients, mapping = token_aes(color = token_scale("red")))
animate_process(patients, mode = "relative", jitter = 10, legend = "color",
mapping = token_aes(color = token_scale("employee",
scale = "ordinal",
range = RColorBrewer::brewer.pal(7, "Paired"))))
traffic_fines %>%
process_matrix(frequency("absolute"))
traffic_fines %>%
process_matrix(frequency("absolute")) %>%
plot()
traffic_fines %>%
process_matrix(frequency("relative-case"))
traffic_fines %>%
process_matrix(frequency("relative-case")) %>%
plot()
traffic_fines %>%
process_matrix(frequency("relative-antecedent"))
traffic_fines %>%
process_matrix(frequency("relative-antecedent")) %>%
plot()
traffic_fines %>%
process_matrix(frequency("relative-consequent"))
traffic_fines %>%
process_matrix(frequency("relative-consequent")) %>%
plot()
traffic_fines %>%
process_matrix(performance(FUN = mean, units = "weeks"))
traffic_fines %>%
process_matrix(performance(FUN = mean, units = "weeks")) %>%
plot()
sepsis %>%
dotted_chart(x = "absolute")
sepsis %>%
dotted_chart(x = "absolute", sort = "end")
sepsis %>%
dotted_chart(x = "relative")
sepsis %>%
dotted_chart(x = "relative_week",
scale_color = ggplot2::scale_color_discrete)
sepsis %>%
dotted_chart(x = "relative_day")
sepsis %>%
dotted_chart(x = "relative_week")
Different activity sequences in the log can be visualized with trace_explorer(). It can be used to explore frequent as well as infrequent traces.
sepsis %>%
trace_explorer()
Warning: No `coverage` or `n_traces` set.
! Defaulting to `coverage` = 0.2 for `type` = "frequent" traces.
sepsis %>%
trace_explorer(coverage = 0.15)
You can also set the coverage by directly specifying the number of traces to show.
sepsis %>%
trace_explorer(n_traces = 10)
Instead of giving priority to frequent traces, you can show infrequent traces.
sepsis %>%
trace_explorer(n_traces = 10, type = "infrequent")
You can set which metric to include using coverage_labels, as well as change the order.
sepsis %>%
trace_explorer(n_traces = 10,
coverage_labels = c("cumulative", "relative"))
The labels shown on the traces can be configured with the arguments label_size, show_labels and abbreviate. Increasing the label size.
sepsis %>%
trace_explorer(n_traces = 10, label_size = 4)
Removing the labels.
sepsis %>%
trace_explorer(n_traces = 10,
show_labels = FALSE)
Disabling the abbreviation of labels.
sepsis %>%
trace_explorer(n_traces = 10, abbreviate = FALSE)
Set the colors
sepsis %>%
trace_explorer(n_traces = 10,
scale_fill = ggplot2::scale_fill_discrete)
traffic_fines %>%
ps_detailed()
The number of segments shown in the performance spectrum can be configured in two ways. The first is to explicitly state the number of segments to show.
traffic_fines %>%
ps_detailed(n_segments = 10)
traffic_fines %>%
ps_detailed(classification = "resource")
traffic_fines %>%
end_activities("case") %>%
augment(traffic_fines, prefix = "end") %>%
ps_detailed(classification = "end_activity")
The aggregated performance spectrum works in exactly the same way.
traffic_fines %>%
end_activities("case") %>%
augment(traffic_fines, prefix = "end") %>%
group_by(end_activity) %>%
ps_aggregated()
Activity presence shows in what percentage of cases an activity is present. It has no level argument.
patients %>% activity_presence() %>%
plot
patients %>%
activity_frequency("activity")
patients %>%
start_activities("resource-activity")
patients %>%
end_activities("resource-activity")
The trace coverage metric shows the relationship between the number of different activity sequences (i.e. traces) and the number of cases they cover.
patients %>%
trace_coverage("trace") %>%
plot()
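The idea behind trace coverage can be sketched in base R: count how often each distinct trace occurs, sort by frequency, and accumulate the share of cases covered. The five cases below are hypothetical, and the column names are chosen for illustration rather than copied from edeaR's output.

```r
# Hypothetical log: the trace (activity sequence) followed by each of five cases
case_traces <- c("A,B,C", "A,B,C", "A,C", "A,B,C", "A,B")

# Frequency of each distinct trace, most frequent first
trace_counts <- sort(table(case_traces), decreasing = TRUE)

# Share of cases per trace and the running total
relative <- as.numeric(trace_counts) / length(case_traces)
cumulative <- cumsum(relative)

coverage <- data.frame(trace = names(trace_counts),
                       absolute = as.numeric(trace_counts),
                       relative, cumulative)
coverage
```

Here the most frequent trace alone covers 60% of the cases, and all three traces together cover 100%.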
The trace length metric describes the length of traces, i.e. the number of activity instances for each case. It can be computed at the levels case, trace and log.
patients %>%
trace_length("log") %>%
plot
Idle time is the time during which there is no activity in a case or for a resource. It can only be calculated when both start and end timestamps are available for activity instances. It can be computed at the levels trace, resource, case and log, and using different time units.
patients %>%
idle_time("resource", units = "days")
The output of all metrics in edeaR can be visualized by supplying it to the plot function.
patients %>%
idle_time("resource", units = "days") %>%
plot()
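For a single case, idle time can be thought of as the sum of the gaps between the end of one activity instance and the start of the next. The sketch below uses made-up timestamps and assumes the instances do not overlap; edeaR may treat overlapping instances differently, so this is only an approximation of the concept.

```r
# Hypothetical activity instances of one case
case_log <- data.frame(
  handling = c("Registration", "Triage", "Discharge"),
  start = as.POSIXct(c("2021-01-01 08:00:00", "2021-01-01 09:00:00",
                       "2021-01-01 12:00:00"), tz = "UTC"),
  complete = as.POSIXct(c("2021-01-01 08:30:00", "2021-01-01 10:00:00",
                          "2021-01-01 12:15:00"), tz = "UTC")
)

# Order by start time, then measure the gap before each following instance
case_log <- case_log[order(case_log$start), ]
n <- nrow(case_log)
gaps <- as.numeric(difftime(case_log$start[-1], case_log$complete[-n],
                            units = "hours"))

# Negative gaps would indicate overlap; they are clipped to zero here
idle_hours <- sum(pmax(gaps, 0))
idle_hours
```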
The processing time can be computed at the levels log, trace, case, activity and resource-activity. It can only be calculated when there are both start and end timestamps available for activity instances.
patients %>%
processing_time("activity") %>%
plot
The throughput time is the time from the very first event to the last event of a case. The levels at which it can be computed are log, trace, or case.
patients %>%
throughput_time("log") %>%
plot()
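The definition translates directly into a per-case difference between the latest and earliest timestamp. The events below are hypothetical; the sketch illustrates the definition and does not reproduce edeaR's output format.

```r
# Hypothetical events for two cases
events <- data.frame(
  patient = c("p1", "p1", "p1", "p2", "p2"),
  time = as.POSIXct(c("2021-01-01 08:00:00", "2021-01-01 09:00:00",
                      "2021-01-01 12:00:00", "2021-01-02 10:00:00",
                      "2021-01-02 10:30:00"), tz = "UTC")
)

# Case level: last event minus first event, in hours
throughput_hours <- sapply(split(events$time, events$patient), function(t) {
  as.numeric(difftime(max(t), min(t), units = "hours"))
})
throughput_hours

# Log level: summary statistics over all cases
summary(throughput_hours)
```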
The resource frequency metric allows the computation of the number/frequency of resources at the levels of log, case, activity, resource, and resource-activity.
patients %>%
resource_frequency("resource")
Resource involvement refers to the notion of the number of cases in which a resource is involved. It can be computed at levels case, resource, and resource-activity.
In this example, only r1 and r2 are involved in all cases, r6 and r7 are involved in most cases, and the others are involved in roughly half of the cases.
patients %>%
resource_involvement(level = "resource") %>% plot
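At the resource level, involvement amounts to counting the distinct cases each resource appears in and dividing by the total number of cases. A base R sketch on a hypothetical log (the resource names below are made up and unrelated to the patients data):

```r
# Hypothetical log: which resource worked on which case
log_rows <- data.frame(
  case = c(1, 1, 2, 2, 3),
  resource = c("r1", "r2", "r1", "r3", "r1")
)

# Distinct cases per resource, relative to the number of cases in the log
n_cases <- length(unique(log_rows$case))
involvement <- tapply(log_rows$case, log_rows$resource,
                      function(x) length(unique(x))) / n_cases
round(involvement, 2)
```

In this toy log, r1 is involved in every case, while r2 and r3 each appear in a third of the cases.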
The resource specialization metric shows whether resources are specialized in certain activities or not. It can be calculated at the levels log, case, resource and activity.
In the simple patients event log, each resource is performing exactly one activity, and is therefore 100% specialized.
patients %>%
resource_specialisation("resource")
A handover-of-work network can be created with the resource_map function. It has the same arguments as the process_map function.
patients %>%
resource_map()
A more compact representation of hand-over-of-work is given by the resource_matrix function, which works the same as the process matrix functions.
patients %>%
resource_matrix() %>%
plot()